Confederate Memorials vs. Lynching Locations

In our country's current political climate, there has been a major push to remove many of the Confederate war memorials across the country, as they symbolize the systemic racism that has been perpetuated since the Civil War. Critics argue that these monuments have emboldened racists across the country to commit heinous acts, but do the monuments act collectively, or does each monument play its own local part in enabling racism? In other words, did racist acts like lynchings occur so frequently during the post-Civil War/Jim Crow era because these monuments were spread all over the country, or does each specific monument contribute equally in its own locale?

In this tutorial, we will look into different ways to organize our data and draw conclusions from the locations of Civil War Memorials (an overwhelming amount of which are Confederate) and the locations of recorded lynchings throughout history (1877-1950). First, we want to organize all this data in a digestible form.

Part I: Data Tidying

This CSV file contains information on every city in the United States. However, since our lynching data only locates events by state and county, we won't be using the city names. What this dataset usefully provides is the latitude and longitude of every county in the US, as well as the county FIPS code, a unique identifier for each county.
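
As a quick illustration (not part of the tutorial's code), a county FIPS code is simply the two-digit state FIPS code followed by the three-digit within-state county code; stored as an integer, any leading zero disappears, which is why Alabama counties appear as 1015 rather than 01015:

```python
# A county FIPS code is the 2-digit state FIPS followed by the 3-digit county
# code. Stored as an integer, any leading zero disappears (1015, not 01015).
def make_county_fips(state_fips, county_code):
    """Combine a state FIPS and a within-state county code into one identifier."""
    return state_fips * 1000 + county_code

assert make_county_fips(38, 79) == 38079   # Rolette County, North Dakota
assert make_county_fips(1, 15) == 1015     # Calhoun County, Alabama
```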

In [232]:
import pandas as pd
import folium

cities = pd.read_csv('uscitiesv1.3.csv')
cities.head()
Out[232]:
city city_ascii state_id state_name county_name county_fips zip lat lng population source id
0 Dunseith Dunseith ND North Dakota Rolette 38079 58329 48.813057 -100.060968 773.0 polygon 1840000000
1 Mylo Mylo ND North Dakota Rolette 38079 58353 48.635278 -99.618756 20.0 polygon 1840000001
2 Antler Antler ND North Dakota Bottineau 38009 58711 48.970853 -101.282380 27.0 polygon 1840000002
3 Bottineau Bottineau ND North Dakota Bottineau 38009 58318 48.827230 -100.445698 2211.0 polygon 1840000003
4 Gardena Gardena ND North Dakota Bottineau 38009 58318 58748 48.700837 -100.497639 29.0 polygon 1840000004

Next we can load in our data on the Civil War Memorials throughout the country. Here, the location is given by state and city, but again, this can easily be converted with our cities dataset.

In [233]:
cwm = pd.read_csv('CivilWarMamorials.csv', encoding = "ISO-8859-1")
cwm.head()
Out[233]:
State City NameOfArtifact Type Year CivilWarStatus
0 Alabama Anniston Major John Pelham Monument Monument 1905 Confederate State
1 Alabama Ashville Confederate Monument Monument/Courthouse 1923 Confederate State
2 Alabama Athens Confederate Soldier Monument Monument/Courthouse 1909 Confederate State
3 Alabama Beauregard Unincorporated Area Beauregard Municipality NaN Confederate State
4 Alabama Birmingham Confederate Soldiers and Sailors Monument Monument 1905 Confederate State

Finally, our last dataset is of recorded lynchings between 1877 and 1950. What exactly counts as a lynching has historically been <a href="https://www.smithsonianmag.com/smart-news/map-shows-over-a-century-of-documented-lynchings-in-united-states-180961877/">hard to define</a>, so datasets about lynchings can be inconsistent. This file, however, contains nearly 4000 documented lynchings in the area of our interest, so we have deemed it legitimate for this tutorial.

Most of the data in this set will not be used, however. We mainly want to know in which counties the lynchings took place, as indicated in the LYNCHCOUNTY and LYNCHFIPS columns.

In [234]:
lynches = pd.read_csv('lynchingdata.csv')
lynches.head()
Out[234]:
STATUS YEAR MONTH DAY NAME ALTNAME1 ALTNAME2 ALTNAME3 RACE SEX AGE LYNCHCOUNTY LynchState LYNCHFIPS MOBCOUNTY MOBSTATE MOBFIPS ACCUSATION METHODOFDEATH
0 Lynching 1877 2 1? Frank J. Astor F. J. Astor Frank J. Aston NaN White Male NaN Arkansas AR 5001 Arkansas AR 5001 Unreported Hanged
1 Lynching 1877 2 2 Aaron Taylor NaN NaN NaN Black Male NaN Monroe GA 13207 Monroe GA 13207 Killing a young white man, son of a “most wo... Hanged
2 Lynching 1877 2 23 — Cage NaN NaN NaN White Male NaN Rapides LA 22079 Rapides LA 22079 Horse stealing and shooting a white man Hanged
3 Possible lynching 1877 3 1 — Morton NaN NaN NaN Black Male Old Harrison KY 21097 Harrison KY 21097 Causing the death of a neighbor Stabbed
4 Lynching 1877 3 13 Jim Walker NaN NaN NaN Black Male 17 Williamson TN 47187 Williamson TN 47187 Attempted murder and robbery of a white woman Hanged

Now we can start combining and tidying these datasets to make them easier to use. We can merge the Civil War Memorials dataset with our cities dataset so that we have the county FIPS code for every memorial. Merging the two on state_name and city gives us this mess of a dataframe; it will be greatly reduced within the next few steps.

In [235]:
cwm.rename(columns={"State": "state_name", "City": "city"}, inplace = True)
cwm = cwm.merge(cities, on=['state_name', 'city'])
cwm.head()
Out[235]:
state_name city NameOfArtifact Type Year CivilWarStatus city_ascii state_id county_name county_fips zip lat lng population source id
0 Alabama Anniston Major John Pelham Monument Monument 1905 Confederate State Anniston AL Calhoun 1015 36201 36206 36207 36205 36202 36204 36254 36257 33.659826 -85.831632 23106.0 polygon 1840013709
1 Alabama Ashville Confederate Monument Monument/Courthouse 1923 Confederate State Ashville AL Saint Clair 1115 35953 35987 33.837043 -86.254422 2212.0 polygon 1840013697
2 Alabama Athens Confederate Soldier Monument Monument/Courthouse 1909 Confederate State Athens AL Limestone 1083 35611 35613 35612 34.802866 -86.971674 21897.0 polygon 1840013542
3 Alabama Birmingham Confederate Soldiers and Sailors Monument Monument 1905 Confederate State Birmingham AL Jefferson 1073 35218 35214 35215 35217 35210 35211 35212 3521... 33.520661 -86.802490 212237.0 polygon 1840013733
4 Alabama Brewton Jefferson Davis Community College School 1965 Confederate State Brewton AL Escambia 1053 36426 36427 31.105178 -87.072192 5408.0 polygon 1840013890

Now we clean up the cities dataset to essentially turn it into a counties dataset. We do this by removing the city name columns as well as any other columns we won't be using. Then we drop the duplicate county names and are left with a much cleaner table.

In [236]:
counties = cities.drop(['city', 'city_ascii','zip', 'population','source','id'], axis=1)

counties.drop_duplicates(subset=['county_name','state_name'], inplace = True)
counties.head()
Out[236]:
state_id state_name county_name county_fips lat lng
0 ND North Dakota Rolette 38079 48.813057 -100.060968
2 ND North Dakota Bottineau 38009 48.970853 -101.282380
11 ND North Dakota Pembina 38067 48.687874 -97.667670
18 ND North Dakota Towner 38095 48.486668 -99.209859
23 ND North Dakota Cavalier 38019 48.630563 -98.704849

Now we go back to our two other datasets. Using the Series.value_counts() method, we can get the number of Civil War Memorials per county FIPS and the number of lynchings per county FIPS. This returns a very simple table of counts per county code, but it is not very readable at this stage. We can begin to see a possible relationship between lynching_count and CWM_count, but the county_fips alone does not communicate our data very clearly.

In [237]:
lynches.rename(columns={"LYNCHCOUNTY": "county_name", 'LynchState': 'state_id'}, inplace = True)
lynches = lynches.merge(counties, on=['county_name', 'state_id'])

counts_cwm = pd.DataFrame(cwm['county_fips'].value_counts()).reset_index().rename(columns={'index':'county_fips','county_fips':'CWM_count'})
counts_lynches = pd.DataFrame(lynches['LYNCHFIPS'].value_counts()).reset_index().rename(columns={'index':'county_fips','LYNCHFIPS':'lynching_count'})

counts_lynches['county_fips'] = counts_lynches['county_fips'].astype(int)
counts_cwm['county_fips'] = counts_cwm['county_fips'].astype(int)

# Use fillna() to set NaN to 0
x = counts_lynches.merge(counts_cwm, on=['county_fips'], how="outer").fillna(value=0)
x.head()
Out[237]:
county_fips lynching_count CWM_count
0 22015 39.0 11.0
1 22073 38.0 2.0
2 22017 33.0 2.0
3 28069 29.0 1.0
4 22105 26.0 0.0

To get our final dataframe, we do one last merge on the county_fips column. We now have a set of counties across the United States, their coordinates, and the number of lynchings and Civil War Memorials in each. We are finally ready to begin regression and visualization.
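
As a side note, pandas merge defaults to how='inner', so this last merge silently drops any county that has neither lynchings nor memorials recorded (those counties never made it into x). A minimal sketch with toy rows, a few borrowed from the tables above:

```python
import pandas as pd

# Toy frames illustrating the final inner merge on county_fips.
counties_toy = pd.DataFrame({'county_fips': [22015, 38079, 1015],
                             'county_name': ['Bossier', 'Rolette', 'Calhoun']})
counts_toy = pd.DataFrame({'county_fips': [22015, 1015],
                           'lynching_count': [39, 0],
                           'CWM_count': [11, 2]})

# Default how='inner' keeps only counties present in both frames, so
# Rolette (38079), which has no recorded counts, is dropped.
merged = counties_toy.merge(counts_toy, on='county_fips')
assert list(merged['county_fips']) == [22015, 1015]
```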

In [238]:
counties = counties.merge(x,on='county_fips')
counties.sort_values(inplace=True, ascending=False, by='lynching_count')
counties.head()
Out[238]:
state_id state_name county_name county_fips lat lng lynching_count CWM_count
215 LA Louisiana Bossier 22015 32.556262 -93.567119 39.0 11.0
470 LA Louisiana Ouachita 22073 32.515978 -92.191803 38.0 2.0
809 LA Louisiana Caddo 22017 32.580983 -93.892681 33.0 2.0
864 MS Mississippi Kemper 28069 32.767633 -88.650878 29.0 1.0
525 LA Louisiana Tangipahoa 22105 30.546302 -90.484812 26.0 0.0

Part II: Regression

For this next part, we will want to start graphing our data. The matplotlib pyplot module is excellent for simple yet understandable graphs. We will also use the numpy library to fit simple regressions, and the scipy library to get more concrete statistics on the regressions we make.

In [239]:
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

Before we can graph our data, we need to extract the lynching counts and the Civil War Memorial counts into series we can easily use later. Once we have those two pieces, we can use the numpy polyfit function to find the regression line. Since polyfit returns only the regression's coefficients, we wrap them with the poly1d function to get a callable polynomial we can plot.

In [240]:
x=counties['lynching_count']
y=counties['CWM_count']

fit = np.polyfit(x, y,1)
fit_fn = np.poly1d(fit)
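
To make the polyfit/poly1d pairing concrete, here is a tiny synthetic check (illustrative data, not our counties): fitting a degree-1 polynomial to points lying exactly on y = 2x + 1 recovers the slope and intercept, and poly1d turns those coefficients into a callable line:

```python
import numpy as np

# Illustrative synthetic check: points lying exactly on y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1

coeffs = np.polyfit(xs, ys, 1)   # highest power first: [slope, intercept]
line = np.poly1d(coeffs)         # callable polynomial built from the coefficients

assert abs(coeffs[0] - 2.0) < 1e-9   # slope recovered
assert abs(line(10) - 21.0) < 1e-9   # evaluate the fitted line at x = 10
```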

Now that we've prepared our data and have our regression, we are ready to plot. For this, we use the pyplot plot function, which takes our x and y lists along with format strings controlling how the data is drawn. Our points are yellow circles, specified by 'yo' ('y' for yellow, 'o' for circle markers). The regression is drawn as a black dashed line, specified by '--k' ('--' for dashed, 'k' for black). For other formatting and argument options, check the pyplot plot function's documentation page.

In [241]:
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

Just from looking at the graph, we cannot see a real linear relationship between lynchings and Confederate war memorials, but for the sake of curiosity, we can use the linregress function from scipy's stats module to find the exact slope, intercept, r-value, and p-value.

In [242]:
print(stats.linregress(x, y))
LinregressResult(slope=0.026719043740162322, intercept=1.3165559360961752, rvalue=0.056071955038894013, pvalue=0.07604602425049764, stderr=0.015044972969723274)

Based on this more concrete data, we can see that there is a slight positive slope, and since the p-value is below 0.1, this suggests a weak positive correlation between lynchings and Confederate war memorials. It is possible that, because most counties have very few lynchings and Confederate war memorials, the correlation is diluted, and we might see a stronger correlation by using only the counties with the most lynchings. To do this, we graph the head of our sorted dataframe (a sample of 30 was chosen because we felt it was not so small that it includes only true outliers, but not so big that it incorporates many of the low-lynching counties from the full dataset). We can also run the scipy linregress function to see how the regression is affected.

In [243]:
fit2 = np.polyfit(x.head(30), y.head(30),1)
fit_fn2 = np.poly1d(fit2) 

plt.plot(x.head(30),y.head(30), 'yo', x.head(30), fit_fn2(x.head(30)), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed; Top 30 Values by Lynchings')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

print(stats.linregress(x.head(30), y.head(30)))
LinregressResult(slope=0.14423388111762375, intercept=-0.92603114099751815, rvalue=0.24385174864531906, pvalue=0.19408643371333512, stderr=0.10840521786249344)

Based on the graph and the linear regression, we can see that the slope of the regression for the high-lynching counties is much higher, but so is our uncertainty, as the much larger p-value shows. Due to these limitations, it might be better to fit a non-linear regression to the full data. To do that, we simply change the last argument of the polyfit function from 1 to 2, which makes our regression quadratic.

In [247]:
fit3 = np.polyfit(x, y, 2)
fit_fn3 = np.poly1d(fit3) 

plt.plot(x,y, 'yo', x, fit_fn3(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

From this regression, we see a much more positive relationship that seems to fit the data better. Due to our large dataset (~1000 rows), we should be able to fit higher-order curves without much issue. We can do this by further increasing the last argument of the polyfit function, as we do below.

In [248]:
fit4 = np.polyfit(x, y, 3)
fit_fn4 = np.poly1d(fit4) 

plt.plot(x,y, 'yo', x, fit_fn4(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

fit5 = np.polyfit(x, y, 5)
fit_fn5 = np.poly1d(fit5) 

plt.plot(x,y, 'yo', x, fit_fn5(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

fit6 = np.polyfit(x, y, 10)
fit_fn6 = np.poly1d(fit6) 

plt.plot(x,y, 'yo', x, fit_fn6(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()

fit7 = np.polyfit(x, y, 100)
fit_fn7 = np.poly1d(fit7) 

plt.plot(x,y, 'yo', x, fit_fn7(x), '--k')
plt.title('Total Lynchings 1877-1950 vs Confederate War Memorials Constructed')
plt.xlabel('Total Lynchings')
plt.ylabel('Confederate War Memorials Constructed')
plt.show()
/opt/conda/lib/python3.6/site-packages/numpy/lib/polynomial.py:583: RuntimeWarning: overflow encountered in multiply
  scale = NX.sqrt((lhs*lhs).sum(axis=0))
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:28: RankWarning: Polyfit may be poorly conditioned

As you can see from the graphs, we are able to match the data better and better with higher-order fits, with our 5th-order regression fitting the data fairly well. This also shows the dangers of overfitting: even Jupyter warns us that a 100th-order regression on ~1000 points is likely to be overfit and poorly conditioned.
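
One way to see why the in-sample fit keeps "improving" is that polynomial models are nested: every degree-d polynomial is also a degree-(d+1) polynomial, so the in-sample residual error can only go down as the order rises, whether or not the extra flexibility is meaningful. A sketch on synthetic data (illustrative, not our counties):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy linear data (illustrative only, not the counties dataset).
xs = np.linspace(0, 10, 50)
ys = 3 * xs + rng.normal(scale=2.0, size=xs.size)

def sse(degree):
    """In-sample sum of squared residuals for a polynomial fit of this degree."""
    fit = np.poly1d(np.polyfit(xs, ys, degree))
    return float(np.sum((ys - fit(xs)) ** 2))

errors = [sse(d) for d in (1, 2, 3, 5)]

# Polynomial families are nested, so in-sample error never increases with degree.
assert all(a >= b - 1e-6 for a, b in zip(errors, errors[1:]))
```

The training error shrinking is therefore no evidence of a better model on its own, which is why held-out data or an information criterion is the usual guard against overfitting.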

Part III: Visualization

For this section, we will walk through a few simple visualizations of our location-based data, using the folium library to plot our points on a map. The first will be very simple: we plot the location of every lynching alongside the location of every Civil War Memorial. Arbitrarily, lynchings will be plotted as blue points and Civil War Memorials as red points.

In [250]:

map_osm = folium.Map(location=[39.50, -98.35], zoom_start=4)


lats = counties['lat']
longs = counties['lng']
lynchings = counties['lynching_count']
cwms = counties['CWM_count']

for lat,long,cwm,lynch in zip(lats, longs, cwms, lynchings):
    if (cwm != 0):
        folium.CircleMarker([lat, long],
                    radius=1.5,
                    color='#ff0000', #red
                   ).add_to(map_osm) 
    if (lynch != 0):
        folium.CircleMarker([lat, long],
                    radius=1,
                    color='#0000ff', #blue
                   ).add_to(map_osm)     

    
map_osm
Out[250]:

As we can see, there is a slight indication that lynchings occurred near the Civil War Memorials, but the map gives little indication of the strength of this relationship. To learn more, we can draw larger circles for counties with higher counts. First, we look at the distribution of the memorials across the United States.
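
The binned if/elif chains used below get the job done; an alternative sketch (not from the original notebook) is a small helper that maps a count to a radius continuously, using a square root so that the largest counties don't swamp the map:

```python
import math

def count_to_radius(count, base=1.0, scale=1.5):
    """Map a nonnegative count to a marker radius. The square root damps large
    counts; base and scale are illustrative tuning knobs, not tutorial values."""
    return base + scale * math.sqrt(count)

assert count_to_radius(0) == 1.0                 # zero count -> smallest marker
assert abs(count_to_radius(4) - 4.0) < 1e-9      # sqrt(4) = 2 -> radius 4.0
```

One could then replace the threshold branches with a single call per county, e.g. folium.CircleMarker([lat, long], radius=count_to_radius(cwm), color='#ff0000').add_to(map_osm2).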

In [251]:
map_osm2 = folium.Map(location=[39.50, -98.35], zoom_start=4)


lats = counties['lat']
longs = counties['lng']
lynchings = counties['lynching_count']
cwms = counties['CWM_count']

for lat,long,lynch,cwm in zip(lats, longs, lynchings, cwms):
###### CIVIL WAR MONUMENTS IN RED ######
    if (cwm > 30) :
        folium.CircleMarker([lat, long],
                    radius=10,
                    color='#ff0000', #red
                    fill_color='#ff0000',
                    fill = True,
                   ).add_to(map_osm2)
    
    elif (cwm > 20) :
        folium.CircleMarker([lat, long],
                    radius=5,
                    color='#ff0000', #red
                    fill_color='#ff0000',
                    fill = True
                   ).add_to(map_osm2)
    elif (cwm > 10) :
        folium.CircleMarker([lat, long],
                    radius=3,
                    color='#ff0000', #red
                   ).add_to(map_osm2)
        
    elif (cwm >= 1) :
        folium.CircleMarker([lat, long],
                    radius=1,
                    color='#ff0000', #red
                   ).add_to(map_osm2)

        
map_osm2
Out[251]:

Now that we have an idea of where the most memorials are located, we can plot the lynchings and see if there is a visible relationship.

In [252]:
for lat,long,lynch,cwm in zip(lats, longs, lynchings, cwms):        
###### LYNCHINGS IN BLUE ######
    if (lynch > 30) :
        folium.CircleMarker([lat, long],
                    radius=10,
                    color='#0000ff', #blue
                    fill_color='#0000ff',
                    fill = True
                   ).add_to(map_osm2) 
        
    elif (lynch > 20) :
        folium.CircleMarker([lat, long],
                    radius=5,
                    color='#0000ff', #blue
                    fill_color='#0000ff',
                    fill = True
                   ).add_to(map_osm2)
        
    elif (lynch > 10) :
        folium.CircleMarker([lat, long],
                     radius=3,
                    color='#0000ff', #blue
                   ).add_to(map_osm2) 
    elif (lynch >= 1) :
        folium.CircleMarker([lat, long],
                    radius=1,
                    color='#0000ff', #blue
                   ).add_to(map_osm2)
        
map_osm2
Out[252]:

Now our visualization tells a deeper story. There is evidence of larger blue circles near red circles, but there are some inconsistencies. For instance, the largest red circle, near Washington D.C., has very few lynching dots nearby. Conversely, we see large blue circles near Louisiana, but there is not always a red dot nearby to "explain" them, as we had sought to discover.

Conclusion

From our data, there does not seem to be a clear geographical correlation between lynchings and Civil War Memorials; however, it is possible that stronger correlations could be drawn using different definitions of lynching. Better correlations might also be found with finer geographical information about where lynchings took place; due to the nature of these lynchings, such fine-grained data is far scarcer than the equivalent data for the Confederate War Memorials. For those interested in this topic, we urge you to look for more specific datasets to use in correlations between lynchings and Confederate War Memorials. We also urge you to look into the National Memorial for Peace and Justice, opening in 2018 in Montgomery, Alabama, as it uses some of the same data we used to create a physical work of art commemorating victims of lynching in America.
